Combining POS-taggers for improved accuracy on Swedish text
Abstract
Several POS-taggers are trained and tested on Swedish text. Methods for improving tagging accuracy are then examined: voting, letting taggers adjust their voting contribution according to how confident they are, and training a new second-level classifier on the output of the taggers. All of these methods are more accurate than the most accurate original tagger, reducing the number of errors by 15% or more. The types of errors these methods correct, and the types that remain, are also examined. The number of errors in some common error categories actually increases, while many uncommon errors are corrected.
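The first two combination methods named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the tag labels, the confidence values, and the tie-breaking rule (ties go to the earliest tagger in the list) are all assumptions made for illustration. The third method, a second-level classifier trained on the taggers' output, is not shown.

```python
from collections import Counter, defaultdict


def majority_vote(tags):
    """Return the tag proposed by the most taggers.

    Assumption: ties are broken in favour of the earliest tagger in the list.
    """
    counts = Counter(tags)
    best = max(counts.values())
    for tag in tags:
        if counts[tag] == best:
            return tag


def weighted_vote(tags, confidences):
    """Return the tag with the highest summed confidence.

    Each tagger contributes its own confidence (hypothetical values in [0, 1])
    instead of a fixed vote of 1.
    """
    scores = defaultdict(float)
    for tag, conf in zip(tags, confidences):
        scores[tag] += conf
    return max(scores, key=scores.get)


def combine(outputs, vote=majority_vote):
    """Combine per-tagger tag sequences token by token.

    `outputs` is a list with one tag sequence per tagger; all sequences
    are assumed to cover the same tokens in the same order.
    """
    return [vote(list(token_tags)) for token_tags in zip(*outputs)]


# Three hypothetical taggers tagging the same four tokens.
tagger_a = ["PN", "VB", "DT", "NN"]
tagger_b = ["PN", "NN", "DT", "NN"]
tagger_c = ["PN", "VB", "PN", "NN"]

combined = combine([tagger_a, tagger_b, tagger_c])

# One confident tagger can outweigh two unsure ones.
plain = majority_vote(["VB", "NN", "NN"])
weighted = weighted_vote(["VB", "NN", "NN"], [0.9, 0.3, 0.4])
```

In this sketch, plain voting picks NN for the last example because two of three taggers propose it, while confidence-weighted voting picks VB because the single confident tagger (0.9) outweighs the other two combined (0.7).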
Similar resources
Combining Pos Taggers for Improved Accuracy to Create Telugu Annotated Texts for Information Retrieval
POS tagging is the process of assigning the correct POS tag (noun, verb, adjective, adverb, or other lexical category marker) to each word of a sentence. POS taggers are developed by modeling the morpho-syntactic structure of natural language text. We attempted to improve the accuracy of existing Telugu POS taggers by using a voting algorithm. The three Telugu POS taggers viz., (1) Ru...
Big is beautiful: Bootstrapping a PoS tagger for Swedish
A statistical part-of-speech tagger trained on a one-million word Swedish corpus with validated tags was used to tag two considerably larger untagged corpora (≈ 78 and 20 million words, respectively) to bootstrap new, improved, tagger models. The new taggers all showed better accuracy both for seen and unseen words, and the best tagger had 97.02% overall accuracy evaluated on the original corpu...
A part-of-speech tagging system for the Persian language
Abstract: Part-of-speech (POS) tagging is an essential task for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...
PoliTa: A multitagger for Polish
Part-of-Speech (POS) tagging is a crucial task in Natural Language Processing (NLP). POS tags may be assigned to tokens in text manually, by trained linguists, or using algorithmic approaches. Particularly, in the case of annotated text corpora, the quantity of textual data makes it unfeasible to rely on manual tagging and automated methods are used extensively. The quality of such methods is o...
Improving Morphosyntactic Tagging of Slovene Language through Meta-tagging
Part-of-speech (PoS) or, better, morphosyntactic tagging is the process of assigning morphosyntactic categories to words in a text, an important pre-processing step for most human language technology applications. PoS-tagging of Slovene texts is a challenging task since the size of the tagset is over one thousand tags (as opposed to English, where the size is typically around sixty) and the sta...